Hao Fei

Senior Research Fellow

School of Computing, National University of Singapore
3 Research Link, Singapore 117602

Profile

I am a senior research fellow at the National University of Singapore, working with Prof. Mong-Li Lee and Prof. Wynne Hsu at IDS, and with Prof. Tat-Seng Chua at NExT++. Previously, I was an associate researcher at Skywork AI Singapore, working with Prof. Shuicheng Yan, and before that an associate researcher at SEA AI lab. I received my Ph.D. from Wuhan University.

My research has been published in top-tier ML/NLP/CV/MM venues, e.g., ICML, NeurIPS, ACL, CVPR, AAAI, WWW, SIGIR, IJCAI, EMNLP, ACM-MM, TPAMI, TKDE, TOIS, TNNLS, TASLP. I received the World AI Conference (WAIC) Rising Star award in 2023 and was ranked among the Top 2% Scientists Worldwide 2024 (Single Year) by Stanford University. My papers have been selected as Most Influential Papers by Paper Digest and as ESI Highly Influential Papers, and have received the 2024 WAIC Outstanding Paper Award. I have regularly served as a (Senior) Area Chair or Senior Program Committee member for top-tier conferences, and I served on the organizing committees of WSDM 2022, EMNLP 2023, ACL 2024, and ACM MM 2025. I serve as an Associate Editor for journals including TALLIP and Neurocomputing, and I am a regularly invited reviewer for many journals, including TPAMI, IJCV, TNNLS, TKDE, and TOIS. My Ph.D. thesis was awarded the Excellent Doctoral Thesis of the Chinese Information Processing Society (CIPS). I received more than ten honors and awards during my Ph.D. studies.

Research

My research interests lie in NLP, CV, and their intersection (i.e., Multimodal/Vision-Language Learning). My long-term goal is to achieve human-level AI centered on multimodal LLMs and generalists. While I previously worked extensively on Structural Modeling of Language & Vision, my most recent focus is on unified multimodal generalists with human-level capacity (Modality, Task, Knowledge) and cognition (Reasoning, Affection), with the following key topics and representative works (detailed in my research statement):

▶  Multimodal Foundation Models: Unified multimodal LLMs and generalists.

  • NExT-GPT: The 1st unified any-to-any multimodal LLM
  • Vitron: The 1st unified pixel-level vision LLM for understanding, generating, segmenting, editing of image and video
  • General-Level: Pioneer the path of MLLM evaluations towards multimodal generalists
  • MLLM tutorial: A pioneering & comprehensive tutorial series for MLLM techniques

▶  Capacity: Comprehension/generation of modalities/tasks, knowledge acquisition.

  • JavisDiT: A novel Diffusion Transformer for synchronized audio-video generation
  • Any2Caption: A SoTA video generation framework from any input conditions
  • Dysen-VDM: Enhance temporal dynamics of text-to-video diffusion from LLMs
  • LayoutLLM-T2I: Enhance fidelity of text-to-image diffusion with layout from LLMs
  • MUIE: The 1st benchmark for grounded multimodal universal information extraction

▶  Cognition: Cross-modal neuro-symbolic reasoning, human-centric affective computing.

  • MCoT-Survey: The 1st systematic survey of MCoT reasoning
  • Video-of-Thought: The 1st video chain-of-thought reasoning framework
  • SymbCoT: The 1st fully LLM-based logical reasoning framework based on chain-of-thought
  • THOR-ISA: The 1st chain-of-thought reasoning framework for implicit sentiment analysis
  • PanoSent: The 1st cognitive-level benchmark for multimodal conversational aspect-based sentiment analysis
  • AvaMERG: The 1st avatar-based multimodal empathetic conversation benchmark

Advertising

I am always looking for collaborations on the above topics, and remote collaboration is welcome. I will provide sufficient GPUs for promising students. Get in touch if you are a Ph.D., master's, or bachelor's student and are interested in what I am working on. For students from Chinese universities, there are also potential openings for research interns (e.g., self-funded or CSC-funded joint Ph.D. projects). Please describe your research status and attach your resume.

News

  10 Apr 2025

We are holding grand challenges on Multimodal Conversational Aspect-based Sentiment Analysis (PanoSent) and Avatar-based Multimodal Empathetic Conversation (AvaMERG) at ACM Multimedia 2025. Call for participation!

  5 Apr 2025

We are holding the first MLLM for Unified Comprehension and Generation (MUCG 2025) workshop and the first Cognition-oriented Multimodal Affective and Empathetic Computing (CogMAEC 2025) workshop at ACM Multimedia 2025. Call for papers!

  27 Mar 2025

We are holding the first Multimodal Knowledge and Language Modeling (MKLM 2025) workshop at IJCAI 2025. Call for papers!

  24 Mar 2025

We have released the first survey on Multimodal Chain-of-Thought Reasoning. Check it out on GitHub!

  27 Feb 2025

Two papers have been accepted by CVPR 2025: 1) Universal Scene Graph Generation and 2) 4D Scene Graph Generation. Congrats to all my co-authors!

  8 Feb 2025

One paper on Multimodal Grammar Induction has been accepted by the Journal of Artificial Intelligence!

  22 Jan 2025

Two papers have been accepted by ICLR 2025: 1) Semantic-equivalent Tokenization and 2) Cross-modal DPO. Congrats to all my co-authors!
